(57) ABSTRACT

An automatic method for rating data files for objectionable content in a distributed computer system includes preprocessing the file to create semantic units, comparing the semantic units with a rating repository containing entries and associated ratings, assigning content rating vectors to the semantic units, and creating a modified data file incorporating rating information derived from the content rating vectors. For text files, the semantic units are words or phrases, and the rating repository also contains words or phrases with corresponding content rating vectors. For audio files, the file is first converted to a text file using voice recognition software. For image files, image processing software is used to recognize individual objects and compare them to basic images and ratings stored in the rating repository. In one embodiment, a composite content rating vector is derived for the file from the individual content rating vectors, and the composite content rating vector is incorporated into the modified file. In an alternate embodiment, semantic units with content rating vectors exceeding preset user limit values of objectionable content are blocked out by display blocks or, for audio, audio blanking signals, for example, beeps. The user can then view or hear the remaining portions of the file. The invention can be used with any type of data file that can be divided into semantic units, and can be implemented in a server, client, search engine, or proxy server.

44 Claims, 9 Drawing Sheets 
AUTOMATIC RATING AND FILTERING OF DATA FILES FOR OBJECTIONABLE CONTENT

FIELD OF THE INVENTION

This invention relates generally to methods for rating data for objectionable content. More particularly, it relates to methods for automatically rating and filtering objectionable data on Web pages.

BACKGROUND ART

The astronomical growth of the World Wide Web in the last decade has put a wide variety of information at the fingertips of anyone with access to a computer connected to the internet. In particular, parents and teachers have found the internet to be a rich educational tool for children, allowing them to conduct research that in the past would have been either impossible or far too time-consuming to be feasible. In addition to valuable information, however, children also have access to offensive or inappropriate information, including violence, pornography, and hate-motivated speech. Because the World Wide Web is inherently a forum for unrestricted content from any source, censoring material that some find objectionable is an unacceptable solution.

Voluntary user-based solutions have been developed for implementation with a Web browser on a client computer. The browser determines whether or not to display a document by applying a set of user-specified criteria. For example, the browser may have access to a list of excluded sites or included sites, provided by a commercial service or a parent or educator. Users can also choose to receive documents only through a Web proxy server, which compares the requested document with an exclusion or inclusion list before sending it to the client computer. Because new content is continually being added to the World Wide Web, however, it is virtually impossible to maintain a current list of inappropriate sites. Limiting the user to a list of included sites might be appropriate for corporate environments, but not for educational ones in which the internet is used for research purposes.

The Recreational Software Advisory Council (RSAC) has developed an objective content rating labeling system for Web sites, called RSAC on the Internet (RSACi). The system produces ratings tags that are compliant with the Platform for Internet Content Selection (PICS) tag system already in place, and that can easily be incorporated into existing HTML documents. The RSACi labels rate content on a scale of zero to four in four categories: violence, nudity, sex, and language. Current Web browsers are designed to read the RSACi tags and determine whether or not to display the document based on content levels the user sets for each of the four categories. The user can also set the browser not to display pages without a rating.

While a good beginning, the RSACi rating system has three significant limitations. First, it is a voluntary system and is effective only if widely implemented. There is somewhat of an incentive for the site creator to assign a rating, even if a zero rating, because some users choose not to display sites without a rating. If the site's creator does not include a rating, it can be generated by an outside source. However, the rate at which content is being added to the Web makes it virtually impossible for a third party to rate every new Web site manually.

Second, while the RSACi rating aims to be objective, it is subject to some amount of discretion of the person doing the rating. At its Web site (http://www.rsac.org), RSAC provides a detailed questionnaire for providing the rating, but the user can easily override or adjust the results.

Finally, there is currently no way to rate dynamically created documents. For example, search engines receive a user query, find applicable documents, and create a search result page listing a number of the located documents. A search result page typically includes a title and a short abstract or extract, along with the URL, for each retrieved document. The result page itself might have objectionable content, and currently the only way to address this problem is for browsers not to display search result pages at all. Without search engines, though, internet research is significantly limited.

A further problem with all of the above solutions, as well as with word-screening or phrase-screening systems, is that they either allow or deny access to entire Web pages. Even if only a small portion of the document is objectionable, the user is prohibited from seeing the entire document. This is especially significant in search result pages, in which one offensive site prevents display of all of the other, unrelated sites.

The situation becomes even more complex when Web pages include non-text data, for example, audio or images. Surrounding text does not always indicate the content of the embedded file, allowing offensive audio or image material to slip through the ratings system. Occasionally, people deliberately mislabel offensive audio or image files in order to mislead monitoring services.

There is a need, therefore, for an automatic rating method for all material available on the World Wide Web, including dynamically created material, that allows greater viewer control over what material is displayed or blocked.

OBJECTS AND ADVANTAGES

Accordingly, it is a primary object of the present invention to provide a method for automatically rating a data file, for example, a Web page, for objectionable content.

It is an additional object of the invention to provide an objective rating method that requires no subjective human input after the system is initially devised.

It is a further object of the present invention to provide a method for automatically rating dynamically created documents as they are being created.

It is yet another object of the present invention to provide a rating and filtering method that blocks objectionable content of a file while allowing access to the remaining inoffensive portions of the file.

It is an additional object of the present invention to provide a method that can be used with any type of data file, including text, audio, and image.

It is another object of the present invention to provide a method for rating and filtering files that can be implemented on a client, server, or proxy server, and can therefore be easily incorporated into existing system architectures.

It is an object of the present invention to provide an automatic rating method that works with existing manual rating methods and requires minimal system changes.

SUMMARY

These objects and advantages are attained by a computer-implemented method for rating a raw data file for objectionable content. The method occurs in a distributed computer system and comprises the steps of preprocessing the raw data file to create semantic units representative of the semantic content of the raw data file, comparing the semantic units with a rating repository comprising semantic entries and corresponding ratings, assigning content rating vectors to the semantic units, and creating a modified data file incorporating rating information derived from the content rating vectors. After the modified data file is created, either all, some, or none of the file will be displayed by a browser to a user at a client computer.

The method works with any type of data file that can be converted to semantic units. Embodiments of the preprocessing step vary with the type of raw data file to be rated. In one embodiment, a text-only HTML document is stripped of its tags and is then parsed into semantic units, for example, words or phrases. In an alternate embodiment, the data file is an audio file, and text data is created from the audio file using standard voice recognition software. The system also creates an audio-to-text correlation between a location in the created text data and a corresponding location in the audio file. The text file is then parsed into semantic units. In a further embodiment, image processing software is used to identify semantic units within an image file. The semantic units of an image file are discrete objects in regions within the image file.

The rating repository used depends on the type of file and related semantic units. For text files, the repository contains entries of words or phrases with corresponding content rating vectors. Each word entry in the repository may have numerous associated content rating vectors for different contexts in which the word is used, determined by surrounding words in the text. Audio files use a similar rating repository, but may include additional entries for sounds. The entries for image files are discrete objects that can be identified by the image processing software. Each discrete object has one or more content rating vectors associated with it. To assign content rating vectors to semantic units, the system first searches the rating repository for an entry equivalent to the semantic unit. If it finds no such entry, it assigns the semantic unit a zero content rating vector. If it does find an entry, it assigns the semantic unit the entry's corresponding content rating vector. If the entry has numerous content rating vectors, the system analyzes surrounding semantic units to determine the appropriate context before assigning a content rating vector.
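The lookup and context rule described above can be illustrated with a short Python sketch. It is not the patent's implementation. It assumes a hypothetical in-memory repository in which each lowercase word maps either to a single CRV or to a list of (context words, CRV) pairs, uses RSACi-style CRVs ordered (nudity, sex, violence, language), and approximates "context" as the words within a fixed window around the unit (the window of 3 is an arbitrary illustrative choice).

```python
from typing import Dict, List, Tuple, Union

# RSACi-style CRV: (nudity, sex, violence, language), each component 0-4.
CRV = Tuple[int, int, int, int]
ZERO_CRV: CRV = (0, 0, 0, 0)

# Hypothetical repository entry: either one context-free CRV, or a list of
# (context words, CRV) pairs where an empty context set is the default rating.
RepoEntry = Union[CRV, List[Tuple[frozenset, CRV]]]


def assign_crvs(units: List[str], repo: Dict[str, RepoEntry], window: int = 3) -> List[CRV]:
    """Assign one CRV to each semantic unit (here, a lowercase word)."""
    crvs: List[CRV] = []
    for i, unit in enumerate(units):
        entry = repo.get(unit)
        if entry is None:
            # No equivalent entry in the repository: zero content rating vector.
            crvs.append(ZERO_CRV)
        elif isinstance(entry, tuple):
            # Exactly one rating, independent of context.
            crvs.append(entry)
        else:
            # Several context-dependent ratings: inspect the surrounding units.
            nearby = set(units[max(0, i - window):i] + units[i + 1:i + 1 + window])
            chosen = ZERO_CRV
            for context, crv in entry:
                if not context:
                    chosen = crv  # default rating, kept unless a context matches
                elif context & nearby:
                    chosen = crv  # first matching context wins
                    break
            crvs.append(chosen)
    return crvs
```

With an entry for "stab" that rates violence 3 only near "knife", the phrase "take a stab at it" receives all-zero CRVs, while "a stab with a knife" gives "stab" the CRV (0, 0, 3, 0).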

In a first preferred embodiment of the invention, a composite content rating vector, comprising a set of components, is derived from the content rating vectors. Each component of the composite content rating vector is derived from corresponding components of the content rating vectors. In one embodiment, each component of the composite content rating vector is a weighted average of the corresponding components of the content rating vectors, wherein the weighted average uses weighting factors related to the value of the components of the content rating vectors. In an alternate embodiment, each component of the composite content rating vector is equal to a selected value of the corresponding components of the content rating vectors. The selected value is the highest of the corresponding components that has at least a predetermined minimum number of occurrences. Many other methods for deriving the composite content rating vector can be used. The composite content rating vector is combined with the raw data file to produce a modified data file containing the composite content rating vector.
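The two derivation rules just described, applied component by component, might look like the following Python sketch. The weighting factor base**v (with base 4) is a hypothetical choice, since the text only requires weights related to component values; the `count_higher` switch reflects the "value or higher" variant given in the detailed description. Per-value minimums and percentage-based minimums are not shown.

```python
from collections import Counter
from typing import Callable, List, Sequence


def weighted_average(values: Sequence[int], base: float = 4.0) -> float:
    """Weighted mean where a value v gets the (hypothetical) weight base**v."""
    weights = [base ** v for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def highest_with_min_occurrences(values: Sequence[int], min_count: int = 1,
                                 count_higher: bool = False) -> int:
    """Highest value occurring at least min_count times (0 if none qualifies).

    With count_higher=True, the count for v is the number of values >= v.
    """
    counts = Counter(values)
    for v in sorted(counts, reverse=True):
        n = sum(c for u, c in counts.items() if u >= v) if count_higher else counts[v]
        if n >= min_count:
            return v
    return 0


def composite_crv(crvs: Sequence[Sequence[int]],
                  rule: Callable[[Sequence[int]], float] = highest_with_min_occurrences) -> List[int]:
    """Derive each CCRV component from the corresponding components of all CRVs."""
    return [round(rule(column)) for column in zip(*crvs)]
```

Applied to the language components (0, 1, 0, 0, 1, 2, 0, 0, 3, 4) used as an example in the detailed description, the plain mean is 1.1, the highest value occurring at least once is 4, and, when higher values also count as occurrences, the highest value with at least two occurrences is 3.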

In a second preferred embodiment, termed filtering, the content rating vectors are compared with preset user limit values that define objectionable content rating vectors to identify objectionable semantic units. Objectionable content corresponding to the identified objectionable semantic units is then replaced by display blocks in a copy of the raw data file to produce a modified data file. Filtering can be performed on files including text, audio, or image data. In a text-only data file, objectionable words or phrases are replaced with, for example, spaces, black rectangles, or a predetermined phrase. In an audio file, objectionable portions that correspond to the objectionable semantic units are located using the audio-to-text correlation. The objectionable portions are replaced with audio blanking signals, for example, a tone or silent space, in a copy of the audio file to produce a modified audio file. Similarly, objectionable discrete objects of image files are identified by comparing content rating vectors with preset user limit values. Content corresponding to the objectionable discrete objects is replaced by image blocks, which may be black rectangles or blurred regions. In an alternate embodiment of the invention, after the objectionable content is replaced, the system derives a modified composite content rating vector for the modified data file from a modified set of content rating vectors. The modified set of content rating vectors does not contain content rating vectors corresponding to the objectionable semantic units.

The method can be implemented using many different 
architectures. In all architectures, the raw data file is stored 
in a server and the preset user limit values are stored in a 
client. All embodiments of the method can be implemented 
in a server, proxy server, or client. As is necessary, the server 
or proxy server obtains the preset user limit values from the 
client, and the proxy server and client obtain the raw data file 
from the server. 
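Whatever the architecture, the client-side decision reduces to a per-component comparison against the preset user limit values. The sketch below is a hypothetical illustration, not the patent's code: `can_filter=False` models a client that only blocks (as in Example 1), `can_filter=True` one that filters instead (as in Example 4), and a missing rating is passed as `None`. It treats a component equal to its limit as acceptable, since content is objectionable only when a component is above the limit.

```python
from typing import Optional, Sequence


def within_limits(ccrv: Sequence[int], limits: Sequence[int]) -> bool:
    """True if no component of the rating is above the corresponding user limit."""
    return all(c <= lim for c, lim in zip(ccrv, limits))


def client_action(ccrv: Optional[Sequence[int]], limits: Sequence[int],
                  can_filter: bool = True) -> str:
    """Decide what a client does with a received document.

    ccrv is None when the document arrived without a rating tag; unrated
    documents are filtered (or blocked) rather than shown unchecked.
    """
    if ccrv is not None and within_limits(ccrv, limits):
        return "display"
    return "filter_then_display" if can_filter else "block"
```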

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 is a block diagram of the rating method of the 
present invention. 

FIG. 2A is a schematic diagram of a raw data file and 
modified data file produced by composite rating. 

FIG. 2B is a schematic diagram of a raw data file and 
modified data file produced by filtering. 

FIG. 2C is a schematic diagram of a raw data file and
modified data file produced by composite rating and filtering.

FIG. 3 is a schematic diagram of a preprocessing step for 
an audio file. 

FIG. 4A is a schematic diagram of a system architecture 
implementing the present invention in a server. 

FIG. 4B is a schematic diagram of a system architecture 
implementing the present invention in a proxy server.

FIG. 4C is a schematic diagram of a system architecture 
implementing the present invention in a client. 

FIG. 4D is a schematic diagram of a system architecture 
in which a search engine implements the present invention. 

FIG. 4E is a schematic diagram of a system architecture 
in which a search engine filters a search result page. 

DETAILED DESCRIPTION 

Although the following detailed description contains 
many specifics for the purposes of illustration, anyone of 
ordinary skill in the art will appreciate that many variations 
and alterations to the following details are within the scope 
of the invention. Accordingly, the following preferred 
embodiment of the invention is set forth without any loss of 
generality to, and without imposing limitations upon, the 
claimed invention. 

A block diagram illustrating the operation of a preferred embodiment of the present invention is shown in FIG. 1. The method is typically carried out within a distributed computer system and includes a series of steps for automatically rating a raw data file for objectionable content. The rating can be used to derive an overall content rating for the file, or to selectively filter content from the document. In the first step, a raw data file 10 is preprocessed to generate semantic units 12, which can be words, phrases, parts of an image, or other such units representative of the semantic content of raw data file 10. Semantic units 12 are then compared with a rating repository 14, which contains entries related to the semantic units and content rating vectors (CRVs) associated with each entry. Content rating vectors 16 are assigned to the semantic units based on the comparison, and, in the final step, the system creates a modified data file 18 incorporating information derived from CRVs 16. Modified data file 18 can be an additional file created from a copy of raw data file 10, or it can be created from and replace the raw data file. Thus, the method includes a preprocessing step 20, a comparison step 22, an assigning step, and a modified file creation step 24.

The raw data file can be a file in any database, but in the preferred embodiment, it is a hypermedia file such as HTML text, a sound file, or an image file. Preprocessing step 20 varies with the type of file. For an HTML text file, the text is parsed into individual words or phrases using methods known in the art. Any tags or document meta-information, which are not displayed to the user, are ignored when the semantic units are created.
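A minimal sketch of HTML preprocessing step 20 follows, using Python's standard `html.parser` module. It assumes that everything inside `head`, `script`, and `style` is non-displayed meta-information, and it produces single-word units only; phrase units could be formed from this list (for example as adjacent word pairs), which is not shown.

```python
import re
from html.parser import HTMLParser
from typing import List


class VisibleTextParser(HTMLParser):
    """Collects text a browser would display; tags and meta-information are ignored."""

    # Elements whose contents are not displayed as page text (title and meta sit in head).
    HIDDEN = {"head", "script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.hidden_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.chunks.append(data)


def html_to_semantic_units(html: str) -> List[str]:
    """Strip tags, then split the displayed text into lowercase word units."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Chunks are joined with spaces, so markup placed inside a word would split it.
    text = " ".join(parser.chunks)
    return re.findall(r"[a-z']+", text.lower())
```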

Different embodiments of a modified text data file formed in modified file creation step 24 are shown in FIGS. 2A-2C. In FIG. 2A, a raw data file 26 is combined with a composite content rating vector (CCRV) 30 for the file to create a modified data file 28. CCRV 30 is derived from the CRVs for each semantic unit. Specifically, CCRV 30 comprises a set of components, and each component is derived from corresponding components of the CRVs. CCRV 30 is added to the document as is currently done for manual CRVs: it is contained in a standard PICS tag 32 for document meta-information that is inserted into the header of an HTML document. A browser or server then extracts CCRV 30 from tag 32. Any reasonable method for deriving the CCRV may be used, and examples are discussed below. This embodiment of the method is called composite rating.
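The following Python sketch shows one way a CCRV could be written into, and read back out of, a PICS tag in an HTML header. The label string is a simplified PICS-1.1 form, and the RSACi service URL and field layout are assumptions rather than details taken from this document; a real deployment would need to emit whatever label fields its browsers and rating service actually expect.

```python
import re
from typing import Optional, Sequence, Tuple

# Assumed RSACi rating-service URL (not specified in the text above).
RSACI_SERVICE = "http://www.rsac.org/ratingsv01.html"


def pics_meta_tag(ccrv: Sequence[int]) -> str:
    """Build a simplified PICS-1.1 label carrying an RSACi-style CCRV (n, s, v, l)."""
    nudity, sex, violence, language = ccrv
    label = (f'(PICS-1.1 "{RSACI_SERVICE}" l '
             f'r (n {nudity} s {sex} v {violence} l {language}))')
    return f"<meta http-equiv=\"PICS-Label\" content='{label}'>"


def add_ccrv_to_html(html: str, ccrv: Sequence[int]) -> str:
    """Insert the label into the document header, creating a head element if needed."""
    tag = pics_meta_tag(ccrv)
    head = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if head:
        return html[:head.end()] + tag + html[head.end():]
    root = re.search(r"<html(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if root:
        return html[:root.end()] + f"<head>{tag}</head>" + html[root.end():]
    return f"<head>{tag}</head>" + html


def extract_ccrv(html: str) -> Optional[Tuple[int, ...]]:
    """Browser or server side: read the CCRV back from the label, or None if unrated."""
    match = re.search(r"r \(n (\d) s (\d) v (\d) l (\d)\)", html)
    return tuple(int(g) for g in match.groups()) if match else None
```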

In an alternate embodiment of the modified file creation step shown in FIG. 2B, termed filtering, the CRVs are used to block specific semantic units. Semantic units of raw data file 34 are compared with the rating repository to obtain CRVs. The system reads preset user limit values, or content settings, defining objectionable CRVs, and compares the CRVs with the preset user limit values to identify objectionable semantic units. If one component of a semantic unit's CRV is above the corresponding preset user limit value, the semantic unit is considered objectionable. Objectionable content corresponding to objectionable semantic units is replaced by a display block or placeholder 38 in modified file 36. For text files, display block 38 may be spaces, a black rectangle, or a phrase indicating the type of content replaced, for example, "<offensive language>" or "<explicit sexual content>." Raw data file 34 is not altered; only modified file 36, which is created dynamically in response to the user limit values, is changed. The content settings are generally stored in a client browser. If the filtering method is performed in a different location in the distributed computer system, the browser either sends the settings or makes them accessible to the other computer.
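Text filtering as just described can be sketched in Python as follows. This is an illustrative sketch only: CRVs are assumed to be RSACi-ordered tuples, a unit is objectionable when any component is above its limit, and the placeholder names the first category that was exceeded. The phrases "explicit sexual content" and "offensive language" follow the examples in the text; "nudity" and "violence" are hypothetical.

```python
from typing import List, Sequence, Tuple

CRV = Tuple[int, int, int, int]  # RSACi order: nudity, sex, violence, language

# "explicit sexual content" and "offensive language" follow the text's examples;
# "nudity" and "violence" are hypothetical placeholder names.
PLACEHOLDERS = ("nudity", "explicit sexual content", "violence", "offensive language")


def exceeded_categories(crv: CRV, limits: CRV) -> List[int]:
    """Indices of components that are above the user's limit for that category."""
    return [i for i, (c, lim) in enumerate(zip(crv, limits)) if c > lim]


def filter_units(units: Sequence[str], crvs: Sequence[CRV], limits: CRV) -> List[str]:
    """Build the modified copy; the raw units are never altered."""
    modified: List[str] = []
    for unit, crv in zip(units, crvs):
        over = exceeded_categories(crv, limits)
        # One component above its limit is enough to make the unit objectionable.
        modified.append(f"<{PLACEHOLDERS[over[0]]}>" if over else unit)
    return modified
```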

These two embodiments can be combined in a number of ways, depending upon where in the distributed system each step is performed. In the example of FIG. 2C, raw data file 40 first receives a CCRV 42, stored in first modified data file 44. If CCRV 42 is above the user limit values, in which case the browser would not display first modified file 44, first modified file 44 is then filtered using the preset user limit values to produce second modified file 46 containing display blocks 48. CRVs corresponding to the objectionable semantic units are deleted to form a modified set of CRVs, and a modified CCRV 50 is derived. CCRV 42 is replaced by modified CCRV 50 in second modified file 46 to produce third modified file 52. The browser extracts modified CCRV 50, which is necessarily below the preset user limit values, and displays third modified file 52. Various other permutations of composite rating and filtering, though not explicitly described here, will be obvious to someone skilled in the art upon reading this description, and are therefore included in the method of the present invention.
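The FIG. 2C sequence can be traced with a small Python sketch. It assumes, for illustration, that each CCRV component is simply the highest corresponding CRV component. Under that rule the modified CCRV cannot exceed the limits, because every CRV left in the modified set is itself within the limits; a rule that could exceed the largest retained component would not carry that guarantee.

```python
from typing import List, Sequence, Tuple

CRV = Tuple[int, ...]


def component_max(crvs: Sequence[CRV], size: int = 4) -> CRV:
    """Assumed composite rule: each CCRV component is the highest corresponding component."""
    return tuple(max((crv[i] for crv in crvs), default=0) for i in range(size))


def within(crv: CRV, limits: CRV) -> bool:
    return all(c <= lim for c, lim in zip(crv, limits))


def rate_then_filter(units: Sequence[str], crvs: Sequence[CRV], limits: CRV,
                     block: str = "[blocked]") -> Tuple[List[str], CRV]:
    """Return the displayable units and the CCRV that accompanies them."""
    ccrv = component_max(crvs)                 # rating for the first modified file
    if within(ccrv, limits):
        return list(units), ccrv               # displayable without filtering
    shown: List[str] = []
    kept: List[CRV] = []                       # modified set of CRVs
    for unit, crv in zip(units, crvs):
        if within(crv, limits):
            shown.append(unit)
            kept.append(crv)
        else:
            shown.append(block)                # display block in the second file
    return shown, component_max(kept)          # modified CCRV for the third file
```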

Non-text data files are easily accommodated by alternate embodiments of the automatic rating method. Standard voice recognition software is used to preprocess audio files, as shown in FIG. 3. Voice recognition software is commonly available; one product is IBM ViaVoice. An audio file 54 is converted in voice recognition step 56 into text data 58. Text data 58 is then parsed in step 60 into semantic units 62, words or phrases, and treated as with the text files described above. During the preprocessing step, an audio-to-text correlation 64 between locations in the text file and corresponding locations in the audio file is created. Audio-to-text correlation 64 is needed to filter audio file 54 (not shown). Objectionable portions of the audio file corresponding to objectionable semantic units, identified in a comparison of CRVs with preset user limit values, are located using audio-to-text correlation 64. Just as words or phrases are blocked out of a text file, portions of the audio file containing objectionable words or phrases can be replaced with audio blanking signals to create a modified audio file. These audio blanking signals can be audio tones, beeps, silent portions, or spoken phrases describing the missing material. Removed portions do not necessarily need to be words. Audio files can also contain sexual or violent sounds, for example, heavy breathing or gunshots. As audio processing technology develops and more sounds can be identified, the sounds can be similarly removed from the modified audio files. The semantic units relating to the sounds might be descriptive words or codes that are also included in the repository database.
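The use of the audio-to-text correlation for blanking can be sketched in Python. The correlation format here is a hypothetical one (for each word in the text data, its start and end time in seconds, as a recognizer might report), and audio is modeled as a plain list of samples; the sketch writes either silence or a sine-wave beep over each objectionable word in a copy of the audio.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical audio-to-text correlation: entry i gives (start_s, end_s) for word i.
Correlation = Sequence[Tuple[float, float]]


def blank_audio(samples: Sequence[float], rate: int, correlation: Correlation,
                objectionable: Sequence[int], tone_hz: float = 0.0) -> List[float]:
    """Return a copy of `samples` with the objectionable words blanked.

    tone_hz == 0 writes silence; otherwise a sine "beep" of that frequency.
    """
    out = list(samples)  # the raw audio is left untouched
    for idx in objectionable:
        start_s, end_s = correlation[idx]
        start = max(0, int(start_s * rate))
        end = min(len(out), math.ceil(end_s * rate))
        for n in range(start, end):
            out[n] = 0.5 * math.sin(2 * math.pi * tone_hz * n / rate) if tone_hz else 0.0
    return out
```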

In an alternate embodiment for image files, image processing software is used in the preprocessing step to recognize discrete objects in regions within an image file. These discrete objects are the semantic units, which are then assigned content rating vectors. Software systems use techniques known in the art, including filters, shape-based indexing, and matching using Daubechies' wavelets, to identify the discrete objects. The repository stores basic images of discrete objects that can be recognized by these software systems. In the filtering embodiment of the method, objectionable regions of the image file are replaced by image blocks, which may be black rectangles. The image blocks can also be formed by blurring regions of the file to make them unrecognizable. The method of the present invention can be used to rate or filter any type of raw data file, including multimedia files. Appropriate semantic units and rating repositories can be easily determined by those skilled in the art.
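The image-block step can be sketched in Python, assuming the recognizer has already returned a bounding box and a CRV for each discrete object (the box format is hypothetical) and that the image is a grayscale grid of integers. Objects within the user limits are kept; the others are painted black or replaced by the region's mean value, an extreme form of blurring that makes the region unrecognizable.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]            # grayscale pixels, 0 = black
Box = Tuple[int, int, int, int]    # hypothetical (top, left, bottom, right), bottom/right exclusive


def block_regions(image: Image, boxes: Sequence[Box], crvs: Sequence[Tuple[int, ...]],
                  limits: Tuple[int, ...], mode: str = "black") -> Image:
    """Return a copy of the image with objectionable objects blocked out.

    mode="black" paints a black rectangle; any other mode fills the region with
    its mean value (a crude, maximal blur).
    """
    out = [row[:] for row in image]  # the raw image is left untouched
    for (top, left, bottom, right), crv in zip(boxes, crvs):
        if all(c <= lim for c, lim in zip(crv, limits)):
            continue  # object within the user limits: keep it
        region = [out[y][x] for y in range(top, bottom) for x in range(left, right)]
        fill = 0 if mode == "black" else sum(region) // max(1, len(region))
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = fill
    return out
```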

Any content rating scheme may be used for devising 
CRVs, depending upon the type of information the user 
wishes to be alerted of. The preferred embodiment uses the 
RSAC on the Internet (RSACi) system developed by the 
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Recreational Software Advisory Council (RSAC), available 4. A slight modification of this embodiment counts an 

at the RSAC Web site at http://www.rsac.org. This standard occurrence when the number or a higher number occurs. For 

has already been developed and is supported by most Web example, 2 or higher has three occurrences. For a predeter- 

browsers. The RSACi system provides content ratings on a mined minimum number of occurrences of two, 3 or higher 

scale of zero to four in four categories: nudity, sex, violence, 5 ^ mc highest corresponding component of the CRVs meet- 

and language. Each category is a component of the CRV; a mg m fs requirement, and the entire file receives a composite 

typical CRV is (N 1 S 0 V 2 L 3). In the preferred language rating of 3. In an additional modification, the 

embodiment, the present invention assigns an RSACi CRV predetermined minimum number of occurrences can vary 

to each semantic unit. f or cacn value. For example, a predetermined minimum 

One goal of the present invention is to create an objective 10 num ber of occurrences of one for component 4 causes a 

ratings system. While no system is completely objective, the composite rating of 4, but if 3 is the highest rating, it must 

present method aims to place the subjectivity in the hands of occur m two of the occurrences for the file to receive a 

the parent selecting the allowable levels. For example, an composite rating of 3, The predetermined minimum number 

RSACi language level of two corresponds to "moderate 0 f occurrences is more likely a percentage of the total 

expletives or profanity/' Based on the RSACi definition of 15 number of CRVs. Appropriate rules can be developed 

moderate, the parent sets the browser ratings accordingly. In through standard statistical analyses comparing manually 

the present invention, the objectivity is implemented in the derived CRVs for an entire file with CRVs for the file's 

ratings repository and its use in assigning CRVs to semantic semantic units. 

units. The ratings repository is created by a person who Additional embodiments of the invention correspond to 

selects the entries and defines an associated CRV for each 20 implementations of the metnod in different components of 

entry. The entries can be words, phrases, sounds, or images, ^ ^fo^ computer system. The following examples 

and are correlated with the algorithm used to assign the are irjtended t0 illustrate, but not limit, potential embodi- 

rating. Some words are objectionable only in certain m£nts of the presenl inveDtiorj . 
contexts, and their entry in the repository can include ratings 

for various contexts. For example, consider the word "stab/' 25 EXAMPLE 1 
When used in the phrase "take a stab at it/' the word is 

harmless and receives a violence rating of zero. However, it SERVER COMPOSITE RATES 

can also be used in an explicitly violent passage to describe , . 

. . . 4 . 1 ■/ r iu i As shown in FIG. 4A, a server 66 implements a composite 

one person stabbing another with a kmre. In that case, the . . ^ . f / 

J , ■ u- u • - 1 ,a a m *„u* ratine step 68 in batch mode. Periodically, it searches tor and 

word or phrase in which it is included might receive a 30 B * /> ppm/ - 

violence rating of three for "aggressive violence or death to fi «f an unrated raw data file 70 and denves a CCRV 72 1 for 

humans." Other words are mild when used alone, but »> ba fd on a rating repository 74 1 either within server -66 or 

■ rr ■ • * ■ _u* * • „u ™«„ in a different computer. It then adds a rating tag to the nle 

become offensive in certain combinations, which may not , c , m, • t - B 

m , , j , u „ n^^A— «k™i„» to create a modified file 76. When it receives a request 78 

necessarily be standard phrases. Consider the words body, , £ , j ct 

ut i » i j «vi • ■ *u • u- from a client browser 80, server 66 sends modified nle 76. 

"hot/' and "lick/' One can imagine their use in pornographic 35 . . o^, 

. 4 . . u- *• a *u w rtf In comparison step 82, client browser 80 compares CCRV 

writing in various combinations, and the close proximity oi * / 9 . r , , 

the three words necessitates a high rating in the sex category. ™ ^ < hc P"* 6 lunit v f u . es 0 de ermme whether to 

However, proximity is not always enough to determine the dls P la y ffle > stc P 84 > 01 not dls P la y ll > st6 P 86 

rating. Consider the following sentence: "It was a very hot EXAMPLE 2 

day, so every body got an ice cream cone to lick/' In this 40 

example, the words receive a zero sex rating, which may be PROXY SERVER COMPOSITE RATES 
determined by the use of "hot" to modify "day," or the 

presence of "ice cream cone." For each word entry, Referring to FIG. 4B, a client browser 88 accesses the 
therefore, the repository might include a basic rating, a list internet through a proxy server 90 that stores the preset user 
of phrases in which the word can occur, with corresponding 45 limit values. When the user sends a request 91 for a raw data 
ratings for the phrase, or a list of words in the surrounding file 92 stored in server 94, proxy server 90 performs corn- 
text that determine the appropriate rating for the word. parison step 96 using rating repository 98 to calculate a 

For a given rating repository, there are numerous methods CCRV 100 and create a modified data file 102. CCRV 100 
for deriving the components of a CCRV from corresponding is compared in step 104 with the stored user limit values, 
components of the CRVs for each semantic unit. Consider a 50 Depending on the result, proxy server 90 either 106 sends 
small file with only ten semantic units. One of the content the file or 108 does not send the file, instead sending a 
rating categories, language, has the following corresponding replacement document explaining why access was denied, 
components of the CRVs: (0, 1, 0, 0, 1, 2, 0, 0, 3, 4). The EXAMPLE 3 
average of these number is 1.1, clearly not a reasonable ^ 
language component of the CCRV. In one embodiment, each 55 BATCH RATING 
component of the CCRV is a weighted average of corre- 
sponding components of CRVs, in which corresponding RSAC or another organization implements the current 
components of CRVs are multiplied by weighting factors invention on a server. The RSAC server visits other servers, 
relating to values of the components. The 4 in the example on its own initiative or in response to requests, rates all of 
above has the highest weighting factor, in order to skew the eo the documents, and inserts ratings tags into the documents, 
component of the CCRV much higher than the average. 

In another embodiment, each component of the CCRV is EXAMPLE 4 

equal to a selected value of the corresponding components CLIENT FILTERS 
of the CRVs. The selected value is the highest value that has 

at least a predetermined minimum of occurrences. If the 65 As shown in FIG. 4C, a client browser 110 requests a raw 

predetermined minimum number of occurrences is one, in data file 112 from a server 114. File 112 has either been rated 

the example above, the language component of the CCRV is as in Example 1 or not. Client browser 110 searches 116 for 
a rating and compares 118 the rating with user limit values. If the CCRV is below the user limit values, the browser displays the document, step 120. Otherwise, it filters 122 the document and displays 120 the resulting document. If the document arrives from the server without a rating, the browser immediately filters 122 the document.

EXAMPLE 5

PROXY SERVER FILTERS

This embodiment is similar to Example 2, except that the proxy server uses the stored preset user limit values to filter the document, rather than just rate it.

EXAMPLE 6

SEARCH ENGINE COMPOSITE RATES

As shown in FIG. 4D, a client browser 124 sends a search query 126 to a search engine 128. To perform search step 130, search engine 128 retrieves the relevant documents from its database 132 and creates a search result page 134, to which it assigns a CCRV 136 in a comparison step 138 using a rating repository 140. CCRV 136 is added to search result page 134 to create a modified search result page 142. Client browser 124 compares 144 modified search result page 142 with preset user limit values 146, and then either does 148 or does not 150 display modified search result page 142. Alternately, client browser 124 filters modified search result page 142.

EXAMPLE 7

SEARCH ENGINE FILTERS

Referring to FIG. 4E, client browser 152 sends preset user limit values or content settings 154 along with a search query 156 to a search engine 158. Search engine 158 performs a search 160 of its database 162 to create a search result page 164. In step 166, it filters and composite rates page 164 based on content settings 154 and rating repository 168. Search engine 158 adds a CCRV 170 to the filtered page to create a modified search result page 172 that it sends to client browser 152. Because the filtering process is based on user limits 154, CCRV 170 is necessarily below user limits 154, and modified search result page 172 will be displayed in step 174. CCRV 170 is necessary because client browser 152 may be set not to display unrated pages.

It will be clear to one skilled in the art that the above embodiment may be altered in many ways without departing from the scope of the invention. Accordingly, the scope of the invention should be determined by the following claims and their legal equivalents.

What is claimed is:

1. In a distributed computer system, a computer-implemented method for automatic rating a raw data file for objectionable content, wherein said raw data file is a hypermedia file, a text file, an audio file, or an image file, said method comprising the steps of:

preprocessing said raw data file to create semantic units representative of semantic contents of said raw data file;

comparing said semantic units with a content rating repository comprising semantic entries and corresponding content ratings;

assigning content rating vectors to said semantic units based on said comparing step; and

creating a modified data file incorporating rating information derived from said content rating vectors,

wherein when said raw data file is an audio file and said modified data file is a modified audio file, said preprocessing step further comprising the steps of:

using a voice recognition system to create text data from said audio file;

creating an audio-to-text correlation between a location in said text data and a corresponding location in said audio file; and

parsing said text data into said semantic units.

2. The computer-implemented method of claim 1 wherein said step of creating a modified data file comprises the steps of:

deriving a composite content rating vector for said raw data file from said content rating vectors; and

combining said composite content rating vector with said raw data file to produce said modified data file.

3. The computer-implemented method of claim 2 wherein said composite content rating vector comprises a set of components, wherein each component in said set of components is derived from corresponding components of said content rating vectors.

4. The computer-implemented method of claim 3 wherein each component of said composite content rating vector is a weighted average of said corresponding components of said content rating vectors, said weighted average including weighting factors related to values of said corresponding components of said content rating vectors.

5. The computer-implemented method of claim 3 wherein each component of said composite content rating vector is equal to a selected value of said corresponding components of said content rating vectors, wherein said selected value is a highest of said corresponding components of said content rating vectors and said selected value has at least a predetermined minimum number of occurrences.

6. The computer-implemented method of claim 2 wherein said method occurs in a server.

7. The computer-implemented method of claim 2 wherein said raw data file is stored in a server and said method occurs in a proxy server.

8. The computer-implemented method of claim 2 wherein said raw data file is stored in a server and said method occurs in a client.

9. The computer-implemented method of claim 1 wherein said step of creating a modified data file comprises the steps of:

comparing said content rating vectors with preset user limit values to identify objectionable semantic units, wherein said preset user limit values define objectionable content rating vectors; and

replacing objectionable content corresponding to the identified objectionable semantic units in a copy of said raw data file with display blocks to produce said modified data file.

10. The computer-implemented method of claim 9 wherein said raw data file is a file chosen from the group consisting of text, audio, and image.

11. The computer-implemented method of claim 9 wherein said raw data file is stored in a server and said method occurs in a client.

12. The computer-implemented method of claim 9 wherein said preset user limit values are stored in a client and said method occurs in a server.

13. The computer-implemented method of claim 9 wherein said preset user limit values are stored in a client, said raw data file is stored in a server, and said method occurs in a proxy server.
14. The computer-implemented method of claim 9,
wherein said step of creating a modified data file further 
comprises the steps of: 

deriving a modified composite content rating vector for 
said modified data file from a modified set of content
rating vectors, wherein said modified set of content 
rating vectors does not contain content rating vectors 
corresponding to said objectionable semantic units; and 

storing said modified composite content rating vector in 
said modified data file.

15. The computer-implemented method of claim 14 
wherein said preset user limit values are stored in a client 
and said method occurs in a server. 

16. The computer-implemented method of claim 14 
wherein said preset user limit values are stored in a client,
said raw data file is stored in a server, and said method 
occurs in a proxy server. 

17. The computer-implemented method of claim 1 
wherein said step of creating a modified audio file comprises 
the steps of:

comparing said content rating vectors with preset user 
limit values to identify objectionable semantic units, 
wherein said preset user limit values define objection- 
able content rating vectors; 

using said audio-to-text correlation to locate objection- 
able portions of said audio file corresponding to the 
identified objectionable semantic units; and 

replacing said objectionable portions in a copy of said 
audio file with audio blanking signals to produce said
modified audio file. 

18. The computer-implemented method of claim 17 
wherein said audio file is stored in a server and said method 
occurs in a client. 

19. The computer-implemented method of claim 17
wherein said preset user limit values are stored in a client 
and said method occurs in a server. 

20. The computer-implemented method of claim 17 
wherein said preset user limit values are stored in a client, 
said audio file is stored in a server, and said method occurs
in a proxy server. 

21. The computer-implemented method of claim 1 
wherein said raw data file is an image file, said modified data 
file is a modified image file, said semantic units are discrete 
objects in regions within said image file, and said
preprocessing step is performed by an image processing system.

22. The computer-implemented method of claim 21 
wherein said step of creating a modified image file com- 
prises the steps of: 

comparing said content rating vectors with preset user
limit values to identify objectionable discrete objects, 
wherein said preset user limit values define objection- 
able content rating vectors; and 

replacing objectionable content corresponding to the 
identified objectionable discrete objects in a copy of
said image file with image blocks to produce said 
modified image file. 

23. The computer-implemented method of claim 22
wherein said image file is stored in a server and said method 
occurs in a client.

24. The computer-implemented method of claim 22 
wherein said preset user limit values are stored in a client 
and said method occurs in a server. 

25. The computer-implemented method of claim 22 
wherein said preset user limit values are stored in a client,
said image file is stored in a server, and said method occurs
in a proxy server.
26. A method for automatic rating and filtering in a
network environment a raw data file for objectionable 
content, wherein said raw data file is a hypermedia file, a text 
file, an audio file, or an image file, said method comprising 
the steps of: 

preprocessing said raw data file to create semantic units 
representative of semantic contents of said raw data 
file, wherein 

if said raw data file is an audio file said preprocessing
step further comprises the steps of:

using a voice recognition system to create text data 
from said audio file; 

creating an audio-to-text correlation between a loca- 
tion in said text data and a corresponding location 
in said audio file; and 

parsing said text data into said semantic units; and 
wherein 

if said raw data file is an image file said semantic units are 
discrete objects in regions within said image file and 
said preprocessing step is performed by an image 
processing system; 

comparing said semantic units with a content rating 
repository comprising semantic entries and correspond- 
ing content ratings; 

assigning content rating vectors to said semantic units 
based on said comparing step; and 

creating a modified data file incorporating rating infor- 
mation derived from said content rating vectors. 

27. The method of claim 26, wherein said step of creating 
a modified data file further comprises the steps of: 

deriving a composite content rating vector for said raw 
data file from said content rating vectors, wherein said 
composite content rating vector comprises a set of 
components each of which is derived from correspond- 
ing components of said content rating vectors; and 

combining said composite content rating vector with said 
raw data file to produce said modified data file. 

28. The method of claim 27, wherein each component of 
said composite content rating vector is a weighted average 
of said corresponding components of said content rating 
vectors, said weighted average including weighting factors 
related to values of said corresponding components of said 
content rating vectors. 

29. The method of claim 27, wherein each component of 
said composite content rating vector is equal to a selected 
value of said corresponding components of said content 
rating vectors, and wherein said selected value is the highest 
of said corresponding components of said content rating 
vectors and said selected value has at least a predetermined 
minimum number of occurrences. 

30. The method of claim 26, wherein said method occurs 
in a server. 

31. The method of claim 26, wherein said raw data file is 
stored in a server and said method occurs in a proxy server. 

32. The method of claim 26, wherein said raw data file is 
stored in a server and said method occurs in a client. 

33. The method of claim 26, wherein said step of creating 
a modified data file comprises the steps of: 

comparing said content rating vectors with preset user 
limit values to identify objectionable semantic units, 
wherein said preset user limit values define objection- 
able content rating vectors; and 

replacing objectionable content corresponding to the 
identified objectionable semantic units in a copy of said 
raw data file with display blocks to produce said 
modified data file. 
34. The method of claim 33, wherein said preset user limit
values are stored in a client and said method occurs in a 
server. 

35. The method of claim 33, wherein said preset user limit 
values are stored in a client, said raw data file is stored in a 
server, and said method occurs in a proxy server. 

36. The method of claim 33, wherein said step of creating 
a modified data file further comprises the steps of: 

deriving a modified composite content rating vector for 
said modified data file from a modified set of content 
rating vectors, wherein said modified set of content 
rating vectors does not contain content rating vectors 
corresponding to said objectionable semantic units; and 

storing said modified composite content rating vector in 
said modified data file. 

37. The method of claim 26, wherein said raw data file is 
an audio file and said modified data file is a modified audio 
file, said step of creating a modified data file further com- 
prises the steps of: 

comparing said content rating vectors with preset user 
limit values to identify objectionable semantic units, 
wherein said preset user limit values define objection- 
able content rating vectors; 

using said audio-to-text correlation to locate objection- 
able portions of said audio file corresponding to the 
identified objectionable semantic units; and 

replacing said objectionable portions in a copy of said 
audio file with audio blanking signals to produce said 
modified audio file. 

38. The method of claim 37, wherein said audio file is 
stored in a server and said method occurs in a client. 

39. The method of claim 37, wherein said preset user limit
values are stored in a client and said method occurs in a
server.

40. The method of claim 37, wherein said preset user limit 
values are stored in a client, said audio file is stored in a 
server, and said method occurs in a proxy server. 

41. The method of claim 26, wherein said raw data file is
an image file and said modified data file is a modified image
file, said step of creating a modified data file further
comprises the steps of:

comparing said content rating vectors with preset user
limit values to identify objectionable discrete objects,
wherein said preset user limit values define
objectionable content rating vectors; and

replacing objectionable content corresponding to the
identified objectionable discrete objects in a copy of
said image file with image blocks to produce said
modified image file.

42. The method of claim 41, wherein said image file is 
stored in a server and said method occurs in a client. 

43. The method of claim 41, wherein said preset user limit
values are stored in a client and said method occurs in a
server.

44. The method of claim 41, wherein said preset user limit 
values are stored in a client, said image file is stored in a 
server, and said method occurs in a proxy server. 
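The composite-rating and filtering steps described in the examples and claims (the weighted-average and highest-value CCRV embodiments, the display-block filtering of claims 9 and 33, the modified CCRV of claims 14 and 36, and the client flow of Example 4) can be sketched in Python. This is a minimal, non-authoritative sketch under stated assumptions: CRVs are tuples of integers, one per rating category; the weighting factor is assumed to be `base ** value`, since the text says only that the factors relate to the component values; a CRV is treated as objectionable when any component exceeds the corresponding user limit; the string `[BLOCKED]` stands in for a display block; and every function name is hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

CRV = Tuple[int, ...]  # hypothetical layout: one integer per rating category


def plain_average(values: Sequence[int]) -> float:
    """Unweighted mean of one component across all CRVs."""
    return sum(values) / len(values)


def weighted_average(values: Sequence[int], base: float = 2.0) -> float:
    """Weighted-average embodiment. Assumed weighting factor: base ** value,
    so higher (more objectionable) values pull the result upward."""
    weights = [base ** v for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def highest_with_min_occurrences(values: Sequence[int],
                                 min_occurrences: int = 1,
                                 default: int = 0) -> int:
    """Selected-value embodiment: the highest value that occurs at least
    min_occurrences times; `default` if none qualifies (an assumption)."""
    counts = Counter(values)
    qualifying = [v for v, n in counts.items() if n >= min_occurrences]
    return max(qualifying) if qualifying else default


def composite_rating(crvs: Sequence[CRV],
                     combine: Callable[[List[int]], float],
                     width: int) -> tuple:
    """Build a CCRV by combining corresponding components of the CRVs."""
    if not crvs:
        return (0,) * width  # assumption: nothing left to rate -> all zeros
    return tuple(combine(list(column)) for column in zip(*crvs))


def is_objectionable(crv: Sequence[float], limits: CRV) -> bool:
    """Assumed rule: objectionable if any component exceeds its user limit."""
    return any(c > lim for c, lim in zip(crv, limits))


def filter_units(units: Sequence[str], crvs: Sequence[CRV], limits: CRV,
                 block: str = "[BLOCKED]") -> Tuple[List[str], List[CRV]]:
    """Replace objectionable units with a display block. Also return the CRVs
    of the units that were kept, from which a modified CCRV can be derived."""
    out: List[str] = []
    kept: List[CRV] = []
    for unit, crv in zip(units, crvs):
        if is_objectionable(crv, limits):
            out.append(block)
        else:
            out.append(unit)
            kept.append(crv)
    return out, kept


def client_view(units: Sequence[str], crvs: Sequence[CRV],
                ccrv: Optional[CRV], limits: CRV) -> List[str]:
    """Example 4 flow: display a rated document whose CCRV is within the
    limits unchanged; otherwise (over the limits, or unrated) filter it."""
    if ccrv is not None and not is_objectionable(ccrv, limits):
        return list(units)
    return filter_units(units, crvs, limits)[0]


if __name__ == "__main__":
    language = [0, 1, 0, 0, 1, 2, 0, 0, 3, 4]
    crvs = [(v,) for v in language]            # one-component CRVs
    words = [f"w{i}" for i in range(len(crvs))]
    print(plain_average(language))             # 1.1
    print(composite_rating(crvs, highest_with_min_occurrences, 1))  # (4,)
    shown, kept = filter_units(words, crvs, limits=(2,))
    print(shown[-3:])                          # ['w7', '[BLOCKED]', '[BLOCKED]']
    print(composite_rating(kept, highest_with_min_occurrences, 1))  # (2,)
```

With the language example (0, 1, 0, 0, 1, 2, 0, 0, 3, 4), the plain mean is 1.1, the assumed base-2 weighting gives 100/37 (about 2.70), and the highest value with at least one occurrence is 4. Blocking every unit whose rating exceeds a limit of 2 leaves a modified CCRV of 2 under the same rule.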